Abstract: We derive bounds on the noncoherent capacity of a very
general class of multiple-input multiple-output channels that allow
for selectivity in time and frequency as well as for spatial
correlation. The bounds apply to peak-constrained inputs; they are
explicit in the channel's scattering function, are useful for a large
range of bandwidth, and allow us to coarsely identify the
capacity-optimal combination of bandwidth and number of transmit
antennas. Furthermore, we obtain a closed-form expression for the
first-order Taylor series expansion of capacity in the limit of
infinite bandwidth. From this expression, we conclude that in the
wideband regime: (i) it is optimal to use only one transmit antenna
when the channel is spatially uncorrelated; (ii) rank-one statistical
beamforming is optimal if the channel is spatially correlated; and
(iii) spatial correlation, be it at the transmitter, the receiver, or
both, is beneficial.

Index Terms: Noncoherent capacity, MIMO systems, underspread
channels, wideband channels.



I. Introduction and Summary of Results 

Bandwidth and space are sources of degrees of freedom that 
can be utilized to transmit information over wireless fading 
channels. Channel measurements indicate that an increase in 
the number of degrees of freedom also increases the channel 
uncertainty that the receiver has to resolve [1]. If the transmit 
signal is allowed to be peaky, that is, if it can have an unbounded 
peak value, channel uncertainty is immaterial in the limit of 
infinite bandwidth. Indeed, for a fairly general class of fading 
channels, the capacity of the infinite-bandwidth additive white 
Gaussian noise (AWGN) channel can be achieved [2]-[4]. 

A more realistic modeling assumption is to limit the peak 
power of the transmitted signal. In this case, the capacity be- 
havior of most channels changes drastically: for certain types 
of peak constraints, the capacity can even approach zero in 
the wideband limit [3], [5], [6]. Intuitively, under a peak con- 
straint on the transmit signal, the receiver is no longer able 
to resolve the channel uncertainty as the number of degrees 
of freedom increases. Consequently, questions of significant 
practical relevance are how much bandwidth to use and whether
spatial degrees of freedom obtained by multiple antennas can be
exploited to increase capacity. 

The aim of this paper is to characterize the capacity of spa- 
tially correlated multiple-input multiple-output (MIMO) fading 
channels that are time and frequency selective, i.e., that exhibit 
memory in frequency and time, given that (i) the transmit signal 
has bounded peak power and (ii) the transmitter and the receiver 
know the channel law but both are ignorant of the channel 
realization. The assumptions in (ii) constitute the noncoherent
setting, as opposed to the coherent setting where the receiver 
has perfect channel state information (CSI) and the transmitter 
knows the channel law only. 

Related Work: Sethuraman et al. [7] analyzed the capacity 
of peak-constrained MIMO Rayleigh-fading channels that are 
frequency flat, time selective, and spatially uncorrelated and 
derived an upper bound and a low-SNR lower bound that allow one
to characterize the second-order Taylor series expansion of
capacity around the point SNR = 0. In particular, it is shown 
in [7] that in the low-SNR regime it is optimal to use only 
a single transmit antenna, while additional receive antennas 
are always beneficial. The low-SNR results also apply to a 
wideband channel with fixed total transmit power and increasing 
bandwidth if the wideband channel can be decomposed into 
a set of independent and identically distributed (i.i.d.) parallel 
subchannels in frequency [7]. 

Spatial correlation is often beneficial in the noncoherent set- 
ting. For the separable (Kronecker) spatial correlation model [8], 
[9], Jafar and Goldsmith [10] proved that transmit correlation 
increases the capacity of a memoryless fading channel. Moreover, 
in the low-SNR regime, the rates achievable with on-off keying 
on memoryless fading channels [11] and with finite-cardinality 
constellations on block-fading channels [12] increase in the 
presence of spatial correlation at the transmitter, the receiver, 
or both. 

Contributions: We consider a point-to-point MIMO channel 
model where each component channel between a given trans- 
mit antenna and a given receive antenna is underspread [13] 
and satisfies the standard wide-sense stationary uncorrelated- 
scattering (WSSUS) assumption [14]; hence, our channel model 
allows for selectivity in time and frequency. We assume that the 
component channels are spatially correlated according to the sep- 
arable correlation model [8], [9] and that they are characterized 
by the same scattering function; furthermore, the transmit signal 
is peak constrained. On the basis of a discrete-time, discrete- 
frequency approximation of said channel model that is enabled 
by the underspread property [15], we obtain the following results:

• We derive upper and lower bounds on capacity. These
bounds are explicit in the channel's scattering function and
allow us to coarsely identify the capacity-optimal combination
of bandwidth and number of transmit antennas for a fixed
number of receive antennas.

• For spatially uncorrelated channels, we generalize the
asymptotic results of Sethuraman et al. [7] to time- and
frequency-selective channels: for large enough bandwidth
(or, equivalently, for small enough SNR) it is optimal to
use a single transmit antenna only, while additional receive
antennas always increase capacity.

• Differently from the coherent setting [16]-[18], we find
that both transmit and receive correlation are beneficial 
in the wideband regime. Furthermore, rank-one statistical 
beamforming along the strongest eigenmode of the spatial 
transmit correlation matrix is optimal for large bandwidth. 

As the derivations of the results in the present paper rely 
on several techniques developed in [19] for single-input single- 
output (SISO) time- and frequency-selective channels, we detail 
only the new elements in our derivations and refer to [19] 
otherwise. 

Notation: Uppercase boldface letters denote matrices and
lowercase boldface letters designate vectors. The superscripts $T$,
$*$, and $H$ stand for transposition, element-wise conjugation, and
Hermitian transposition, respectively. For two matrices $\mathbf{A}$
and $\mathbf{B}$ of appropriate dimensions, the Hadamard product is
denoted as $\mathbf{A} \odot \mathbf{B}$ and the Kronecker product is
denoted as $\mathbf{A} \otimes \mathbf{B}$; to simplify notation, we
use the convention that the ordinary matrix product always precedes
the Kronecker and Hadamard products, e.g., $\mathbf{A}\mathbf{B} \odot
\mathbf{C}$ means $(\mathbf{A}\mathbf{B}) \odot \mathbf{C}$ for some
matrix $\mathbf{C}$ of appropriate dimension. We designate the identity
matrix and the all-zero matrix of dimension $N \times N$ by
$\mathbf{I}_N$ and $\mathbf{0}_N$, respectively; $\mathbf{D}^{1/2}$ is
the unique nonnegative definite square-root matrix of the nonnegative
definite matrix $\mathbf{D}$. The determinant of a square matrix
$\mathbf{X}$ is $\det(\mathbf{X})$, its rank is
$\mathrm{rank}(\mathbf{X})$, and its trace is $\mathrm{tr}(\mathbf{X})$.
The vector of eigenvalues of $\mathbf{X}$ is denoted by
$\boldsymbol{\lambda}(\mathbf{X})$. We let $\mathrm{diag}\{\mathbf{x}\}$
denote a diagonal square matrix whose main diagonal contains the
elements of the vector $\mathbf{x}$. The function $\delta(x)$ is the
Dirac distribution. All logarithms are to the base $e$. For two
functions $f(x)$ and $g(x)$, the notation $f(x) = o(g(x))$ means that
$\lim_{x \to 0} f(x)/g(x) = 0$. If two random variables $a$ and $b$
follow the same distribution, we write $a \sim b$. Finally, we denote
the expectation operator by $\mathbb{E}[\cdot]$ and the Fourier
transform operator by $\mathbb{F}[\cdot]$.

II. System Model 

In the following subsections, we first introduce the 
SISO model for one component channel and subsequently 
discuss the extension of this model to the MIMO setting. 

A. Underspread WSSUS Channels 

The relation between the input signal $x(t)$ and the corresponding
output signal $y(t)$ of a SISO stochastic linear time-varying (LTV)
channel $\mathbb{H}$ can be expressed as

$$y(t) = (\mathbb{H}x)(t) + w(t) = \int k_{\mathbb{H}}(t,t')\,x(t')\,dt' + w(t) \tag{1}$$

where $k_{\mathbb{H}}(t,t')$ denotes the random kernel of the channel
operator $\mathbb{H}$ and $w(t)$ is a white Gaussian noise process. We
assume that $k_{\mathbb{H}}(t,t')$ is a zero-mean jointly proper
Gaussian (JPG) process in $t$ and $t'$ whose Fourier transforms are
well defined. In particular, $L_{\mathbb{H}}(t,f) =
\mathbb{F}_{\tau \to f}[k_{\mathbb{H}}(t,t-\tau)]$ is called the
time-varying transfer function and $S_{\mathbb{H}}(\nu,\tau) =
\mathbb{F}_{t \to \nu}[k_{\mathbb{H}}(t,t-\tau)]$ is called the
spreading function. We assume that the channel is WSSUS, so that

$$\mathbb{E}[S_{\mathbb{H}}(\nu,\tau)\,S^*_{\mathbb{H}}(\nu',\tau')] = C_{\mathbb{H}}(\nu,\tau)\,\delta(\nu-\nu')\,\delta(\tau-\tau').$$

Consequently, the statistical properties of the channel $\mathbb{H}$
are completely specified through its so-called scattering function
$C_{\mathbb{H}}(\nu,\tau)$. A WSSUS channel is said to be
underspread [15] if $C_{\mathbb{H}}(\nu,\tau)$ is compactly supported
on a rectangle $[-\nu_0,\nu_0] \times [-\tau_0,\tau_0]$ whose spread
$\Delta_{\mathbb{H}} = 4\nu_0\tau_0$ satisfies $\Delta_{\mathbb{H}} < 1$.

B. Discrete Approximation 

To simplify information-theoretic analysis, we would like to
diagonalize the channel operator $\mathbb{H}$, i.e., replace the
integral input-output (IO) relation (1) by a countable set of scalar
IO relations. To this end, we cannot use an eigendecomposition of the
random kernel $k_{\mathbb{H}}(t,t')$ because its eigenfunctions are
random as well, and hence unknown to the transmitter and the receiver
in the noncoherent setting. Yet, for underspread channels it is
possible to find an orthonormal set of deterministic approximate
eigenfunctions that depend only on the channel's scattering
function [15]. Consequently, knowledge of the channel law (and hence
of the scattering function) is sufficient for transmitter and receiver
to approximately diagonalize $\mathbb{H}$. One possible choice of
approximate eigenfunctions is the Weyl-Heisenberg set of mutually
orthogonal time-frequency shifts $g_{k,n}(t) = g(t-kT)\,e^{i2\pi nFt}$
of some prototype function $g(t)$ that is well localized in time and
frequency. The grid parameters $T$ and $F$ need to satisfy $TF > 1$;
then, the kernel of $\mathbb{H}$ can be approximated as [19]

$$k_{\mathbb{H}}(t,t') \approx \sum_{k=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} \underbrace{L_{\mathbb{H}}(kT,nF)}_{h[k,n]}\, g_{k,n}(t)\, g^*_{k,n}(t'). \tag{2}$$

The approximation quality depends on the prototype function $g(t)$
and on the parameters $T$ and $F$, which need to be suitably chosen
with respect to the scattering function $C_{\mathbb{H}}(\nu,\tau)$
[15], [19]. The eigenvalues of the approximate channel with kernel (2)
are given by $h[k,n] = L_{\mathbb{H}}(kT,nF)$. As the channel is JPG
and WSSUS, the discretized channel process $\{h[k,n]\}$ is also JPG
and stationary in both discrete time $k$ and discrete frequency $n$.
We denote its correlation function by $R[k,n] =
\mathbb{E}[h[k'+k,n'+n]\,h^*[k',n']]$, normalized as $R[0,0] = 1$. The
associated spectral density can be expressed in terms of the
scattering function $C_{\mathbb{H}}(\nu,\tau)$ [19].
We choose $T < 1/(2\nu_0)$ and $F < 1/(2\tau_0)$ so that no aliasing
of the scattering function occurs; for this choice of $T$ and $F$,
the normalization $R[0,0] = 1$ implies that
$\int_\tau \int_\nu C_{\mathbb{H}}(\nu,\tau)\,d\nu\,d\tau = 1$. Next,
we substitute the approximation (2) into (1) and project the input
signal $x(t)$ and the output signal $y(t)$ onto the Weyl-Heisenberg
set $\{g_{k,n}(t)\}$ to obtain the countable set of scalar IO relations

$$y[k,n] = h[k,n]\,x[k,n] + w[k,n] \tag{4}$$

one for each time-frequency slot $(k,n)$. The coefficients
$\{w[k,n]\}$ are i.i.d. JPG with zero mean and variance
normalized to one.
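The grid conditions above can be checked numerically. The following is a minimal sketch rather than part of the original analysis: the values of `nu0` and `tau0` are hypothetical, and the rule $T/F = \tau_0/\nu_0$ used to split a given product $TF$ into $T$ and $F$ is an assumed matching heuristic that is not stated in the text.

```python
import math

def spread(nu0: float, tau0: float) -> float:
    """Spread of a scattering function supported on [-nu0, nu0] x [-tau0, tau0]."""
    return 4.0 * nu0 * tau0

def grid_parameters(nu0: float, tau0: float, tf_product: float = 1.25):
    """Pick (T, F) with T*F = tf_product and T/F = tau0/nu0 (assumed matching rule)."""
    T = math.sqrt(tf_product * tau0 / nu0)
    F = tf_product / T
    return T, F

def grid_is_admissible(T: float, F: float, nu0: float, tau0: float) -> bool:
    """Check TF > 1 together with the no-aliasing conditions T < 1/(2 nu0), F < 1/(2 tau0)."""
    return T * F > 1.0 and T < 1.0 / (2.0 * nu0) and F < 1.0 / (2.0 * tau0)

# Hypothetical channel: 500 Hz maximum Doppler shift, 5 microseconds maximum delay.
nu0, tau0 = 500.0, 5e-6
T, F = grid_parameters(nu0, tau0)
print(spread(nu0, tau0), T, F, grid_is_admissible(T, F, nu0, tau0))
```

Under the assumed matching rule, both no-aliasing conditions reduce to the single requirement that $TF$ times the spread be smaller than one, which is easily met for highly underspread channels.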

C. Extension to Multiple Transmit and Receive Antennas 

We extend the SISO channel model in (4) to a MIMO channel model
with $M_T$ transmit antennas, indexed by $q$, and $M_R$ receive
antennas, indexed by $r$, and assume that all component channels are
characterized by the same scattering function
$C_{\mathbb{H}}(\nu,\tau)$ so that they are diagonalized by the same
Weyl-Heisenberg set $\{g_{k,n}(t)\}$. For each slot $(k,n)$ and
component channel $(r,q)$ the resulting scalar channel coefficient is
denoted as $h_{r,q}[k,n]$. We arrange the coefficients for a given
slot $(k,n)$ in an $M_R \times M_T$ matrix $\mathsf{H}[k,n]$ with
entries $[\mathsf{H}[k,n]]_{r,q} = h_{r,q}[k,n]$. The diagonalized IO
relation of the multiantenna channel is then given by a countable set
of standard MIMO IO relations of the form

$$\mathsf{y}[k,n] = \mathsf{H}[k,n]\,\mathsf{x}[k,n] + \mathsf{w}[k,n] \tag{5}$$

where $\mathsf{x}[k,n] = [x_0[k,n]\; x_1[k,n]\; \cdots\;
x_{M_T-1}[k,n]]^T$ is the $M_T$-dimensional input vector for each
slot $(k,n)$, $\mathsf{y}[k,n] = [y_0[k,n]\; y_1[k,n]\; \cdots\;
y_{M_R-1}[k,n]]^T$ is the $M_R$-dimensional output vector, and
$\mathsf{w}[k,n]$ is the $M_R$-dimensional noise vector.¹
We allow for spatial correlation according to the separable
correlation model [8], [9], so that

$$\mathbb{E}\big[h_{r,q}[k'+k,n'+n]\,h^*_{r',q'}[k',n']\big] = B[r,r']\,A[q,q']\,R[k,n].$$

The $M_T \times M_T$ matrix $\mathbf{A}$ with entries
$[\mathbf{A}]_{q,q'} = A[q,q']$ is called the transmit correlation
matrix, and the $M_R \times M_R$ matrix $\mathbf{B}$, with entries
$[\mathbf{B}]_{r,r'} = B[r,r']$, is the receive correlation matrix.
Consequently,

$$\mathsf{H}[k,n] = \mathbf{B}^{1/2}\,\mathsf{H}_w[k,n]\,(\mathbf{A}^{1/2})^T \tag{6}$$

where $\mathsf{H}_w[k,n]$ is an $M_R \times M_T$ matrix with i.i.d.
JPG entries of zero mean and unit variance for all $(k,n)$. We
normalize $\mathbf{A}$ and $\mathbf{B}$ so that
$\mathrm{tr}(\mathbf{A}) = M_T$ and $\mathrm{tr}(\mathbf{B}) = M_R$.
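As an illustration of the separable model (6), the following sketch draws one correlated channel matrix and checks that the implied covariance of the vectorized channel equals the Kronecker product of the two correlation matrices. The exponential correlation matrices and the helper names `psd_sqrt`, `exp_corr`, and `correlated_slot` are assumptions made for this example only.

```python
import numpy as np

def psd_sqrt(M: np.ndarray) -> np.ndarray:
    """Nonnegative definite square root of a Hermitian nonnegative definite matrix."""
    w, U = np.linalg.eigh(M)
    w = np.clip(w, 0.0, None)            # guard against tiny negative round-off
    return (U * np.sqrt(w)) @ U.conj().T

def exp_corr(M: int, rho: float) -> np.ndarray:
    """Hypothetical exponential correlation matrix; unit diagonal, so its trace is M."""
    idx = np.arange(M)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def correlated_slot(A: np.ndarray, B: np.ndarray, rng) -> np.ndarray:
    """Draw H = B^{1/2} H_w (A^{1/2})^T with i.i.d. zero-mean unit-variance entries in H_w."""
    MR, MT = B.shape[0], A.shape[0]
    Hw = (rng.standard_normal((MR, MT)) + 1j * rng.standard_normal((MR, MT))) / np.sqrt(2)
    return psd_sqrt(B) @ Hw @ psd_sqrt(A).T

MT, MR = 3, 2
A, B = exp_corr(MT, 0.7), exp_corr(MR, 0.4)
H = correlated_slot(A, B, np.random.default_rng(0))

# Column-major vec(H) = (A^{1/2} kron B^{1/2}) vec(H_w), so the covariance of vec(H)
# is G G^H with G = A^{1/2} kron B^{1/2}; this should equal A kron B.
G = np.kron(psd_sqrt(A), psd_sqrt(B))
cov_vecH = G @ G.conj().T
print(np.allclose(cov_vecH, np.kron(A, B)))
```

The check is deterministic: it compares the covariance implied by (6) with the separable structure instead of estimating it from samples.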

¹ To distinguish quantities that pertain to the MIMO IO relation for
an individual slot $(k,n)$ from the corresponding quantities of the
joint time-frequency-space IO relation (8) to be introduced in the
next subsection, we use a sans-serif font for the former quantities.

D. Matrix-Vector Formulation of the Discretized Input-Output Relation

We define a channel use as a $K \times N$ rectangle of time-frequency
slots and stack the symbols $\{x_q[k,n]\}$ transmitted from all $M_T$
transmit antennas during one channel use into an $M_T K N$-dimensional
vector $\mathbf{x}$, the corresponding output $\{y_r[k,n]\}$ for all
$M_R$ receive antennas into an $M_R K N$-dimensional vector
$\mathbf{y}$, and likewise the noise $\{w_r[k,n]\}$ into an
$M_R K N$-dimensional vector $\mathbf{w}$. Stacking proceeds first
along frequency, then along time, and finally along space, as shown
for the input vector $\mathbf{x}$:

$$\mathbf{x}_q[k] = [x_q[k,0]\; x_q[k,1]\; \cdots\; x_q[k,N-1]]^T \tag{7a}$$

$$\mathbf{x}_q = [\mathbf{x}_q^T[0]\; \mathbf{x}_q^T[1]\; \cdots\; \mathbf{x}_q^T[K-1]]^T \tag{7b}$$

$$\mathbf{x} = [\mathbf{x}_0^T\; \mathbf{x}_1^T\; \cdots\; \mathbf{x}_{M_T-1}^T]^T. \tag{7c}$$

Analogously, we stack the channel coefficients, first in frequency
to obtain the vectors $\mathbf{h}_{r,q}[k]$, and then in time to obtain
a vector $\mathbf{h}_{r,q}$ for each component channel $(r,q)$;
further stacking of these vectors along transmit antennas $q$ and then
along receive antennas $r$ results in the $M_T M_R K N$-dimensional
vector $\mathbf{h}$. Let $\mathbf{X}_q = \mathrm{diag}\{\mathbf{x}_q\}$
and $\mathbf{X} = [\mathbf{X}_0\; \mathbf{X}_1\; \cdots\;
\mathbf{X}_{M_T-1}]$, where the vectors $\mathbf{x}_q$ are defined
in (7b). With this notation, the IO relation for one channel use can
be conveniently expressed as

$$\mathbf{y} = (\mathbf{I}_{M_R} \otimes \mathbf{X})\,\mathbf{h} + \mathbf{w}. \tag{8}$$

The distribution of the channel coefficients in a given channel use
is completely characterized by the $M_T M_R K N \times M_T M_R K N$
correlation matrix

$$\mathbb{E}[\mathbf{h}\mathbf{h}^H] = \mathbf{B} \otimes \mathbf{A} \otimes \mathbf{R} \tag{9}$$

where the correlation matrix $\mathbf{R} =
\mathbb{E}[\mathbf{h}_{r,q}\mathbf{h}_{r,q}^H]$ is the same for all
component channels $(r,q)$ by assumption; $\mathbf{R}$ is two-level
Toeplitz, i.e., block-Toeplitz with Toeplitz blocks. We assume that
the three matrices $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{R}$ are
known to the transmitter and the receiver.
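The stacking convention can be made concrete with a short numerical check. The sketch below uses small illustrative dimensions and random data (both assumptions) and verifies that the noiseless part of the stacked relation, $(\mathbf{I}_{M_R} \otimes \mathbf{X})\mathbf{h}$, reproduces the per-slot products $\mathsf{H}[k,n]\mathsf{x}[k,n]$.

```python
import numpy as np

rng = np.random.default_rng(1)
MT, MR, K, N = 2, 3, 4, 5            # small illustrative dimensions

def crandn(*shape):
    """Complex Gaussian test data (values are irrelevant for the identity checked here)."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

x = crandn(MT, K, N)                 # x[q, k, n]
h = crandn(MR, MT, K, N)             # h[r, q, k, n]

# Per-slot relation without noise: y_r[k, n] = sum_q h_{r,q}[k, n] * x_q[k, n].
y_slots = np.einsum("rqkn,qkn->rkn", h, x)

# Stacking: a C-order reshape runs over n fastest, then k, then q, then r,
# which matches "first along frequency, then along time, finally along space".
x_q = x.reshape(MT, K * N)
X = np.hstack([np.diag(x_q[q]) for q in range(MT)])    # KN x (MT * KN)
h_vec = h.reshape(MR * MT * K * N)
y_vec = np.kron(np.eye(MR), X) @ h_vec                 # noiseless part of the stacked relation

print(np.allclose(y_vec, y_slots.reshape(-1)))
```

Building the Kronecker product explicitly is only sensible for toy dimensions; it is used here because it mirrors the notation of the text one to one.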

E. Power Constraints 

We impose a constraint on the average power of the transmitted
signal per channel use such that
$\mathbb{E}[\|\mathbf{x}\|^2]/T \le KP$. In addition, we assume a
peak constraint across transmit antennas in each slot $(k,n)$
according to

$$\frac{1}{T}\,\|\mathsf{x}[k,n]\|^2 \le \frac{\beta P}{N} \tag{10}$$

with probability 1 (w.p.1). Here, $\beta \ge 1$ is the
peak-to-average-power ratio (PAPR).

F. Spatially Decorrelated Input-Output Relation 

Before proceeding to analyze the capacity of the channel just
introduced, we make one more cosmetic change to the IO relation (8),
which simplifies the exposition of our results considerably. For each
slot, we express the input and output vectors in the coordinate
systems defined by the eigendecomposition of the transmit and receive
correlation matrices, respectively. A similar transformation is used
in [10], [12] for a frequency-flat block-fading spatially correlated
MIMO channel. Let the eigendecomposition of the spatial correlation
matrices be

$$\mathbf{A} = \mathbf{U}_A \boldsymbol{\Sigma} \mathbf{U}_A^H, \qquad \mathbf{B} = \mathbf{U}_B \boldsymbol{\Lambda} \mathbf{U}_B^H \tag{11}$$

where $\boldsymbol{\Sigma} = \mathrm{diag}\{[\sigma_0\; \sigma_1\;
\cdots\; \sigma_{M_T-1}]^T\}$ contains the eigenvalues $\{\sigma_q\}$
of $\mathbf{A}$, ordered according to $\sigma_0 \ge \sigma_1 \ge
\cdots \ge \sigma_{M_T-1}$, and, similarly, $\boldsymbol{\Lambda} =
\mathrm{diag}\{[\lambda_0\; \lambda_1\; \cdots\;
\lambda_{M_R-1}]^T\}$ contains the eigenvalues $\{\lambda_r\}$ of
$\mathbf{B}$, ordered according to $\lambda_0 \ge \lambda_1 \ge
\cdots \ge \lambda_{M_R-1}$. The columns of $\mathbf{U}_A$ are called
the transmit eigenmodes and the columns of $\mathbf{U}_B$ are the
receive eigenmodes. Instead of the vectors $\mathsf{x}[k,n]$ and
$\mathsf{y}[k,n]$, we use the rotated vectors
$\mathbf{U}_A^T\mathsf{x}[k,n]$ and $\mathbf{U}_B^H\mathsf{y}[k,n]$,
respectively, to obtain the following spatially decorrelated IO
relation in each slot $(k,n)$:

$$\begin{aligned}
\mathbf{U}_B^H\mathsf{y}[k,n] &= \mathbf{U}_B^H\mathsf{H}[k,n]\,\mathsf{x}[k,n] + \mathbf{U}_B^H\mathsf{w}[k,n] \\
&\overset{(a)}{=} \mathbf{U}_B^H\big(\mathbf{U}_B\boldsymbol{\Lambda}^{1/2}\mathbf{U}_B^H\big)\,\mathsf{H}_w[k,n]\,\big(\mathbf{U}_A\boldsymbol{\Sigma}^{1/2}\mathbf{U}_A^H\big)^T\mathsf{x}[k,n] + \mathbf{U}_B^H\mathsf{w}[k,n] \\
&= \boldsymbol{\Lambda}^{1/2}\,\mathbf{U}_B^H\mathsf{H}_w[k,n]\mathbf{U}_A^*\,\boldsymbol{\Sigma}^{1/2}\,\mathbf{U}_A^T\mathsf{x}[k,n] + \mathbf{U}_B^H\mathsf{w}[k,n]
\end{aligned} \tag{12}$$

where (a) follows from (6). Rotations are unitary operations;
therefore, $\mathbf{U}_B^H\mathsf{H}_w[k,n]\mathbf{U}_A^* \sim
\mathsf{H}_w[k,n]$ and $\mathbf{U}_B^H\mathsf{w}[k,n] \sim
\mathsf{w}[k,n]$. Furthermore, rotations preserve norms, so that the
rotated input vector $\mathbf{U}_A^T\mathsf{x}[k,n]$ satisfies the
same power constraints as the unrotated input vector
$\mathsf{x}[k,n]$. Finally, $\mathbf{U}_B^H\mathsf{y}[k,n]$ is a
sufficient statistic for the output vector $\mathsf{y}[k,n]$. These
three properties imply that the capacity of the channel with input
$\mathsf{x}[k,n]$ and output $\mathsf{y}[k,n]$ in (5) is the same as
the capacity of the spatially decorrelated channel
$\boldsymbol{\Lambda}^{1/2}\mathsf{H}_w[k,n]\boldsymbol{\Sigma}^{1/2}$
in (12) with input $\mathbf{U}_A^T\mathsf{x}[k,n]$ and output
$\mathbf{U}_B^H\mathsf{y}[k,n]$. In the new coordinate system, $q$
indexes transmit eigenmodes instead of transmit antennas, and $r$
indexes receive eigenmodes instead of receive antennas.

It is now tedious but straightforward to similarly rotate the
stacked IO relation (8). To keep notation simple, we choose not to
introduce new symbols for the rotated input and output and for
the spatially decorrelated channel; from here on, all inputs and
outputs are with respect to the rotated coordinate systems, and
the channel vector $\mathbf{h}$ now stands for the spatially
decorrelated stacked channel with correlation matrix

$$\mathbb{E}[\mathbf{h}\mathbf{h}^H] = \boldsymbol{\Lambda} \otimes \boldsymbol{\Sigma} \otimes \mathbf{R}. \tag{13}$$

This correlation matrix is block diagonal, and hence of much
simpler structure than the correlation matrix in (9).
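The effect of the rotation on the stacked correlation matrix can be verified numerically through the mixed-product property of the Kronecker product. The sketch below is an illustration under assumptions: the real symmetric exponential correlation matrices and the stand-in for the time-frequency correlation matrix are hypothetical examples.

```python
import numpy as np

def exp_corr(M: int, rho: float) -> np.ndarray:
    """Hypothetical exponential correlation matrix with unit diagonal."""
    idx = np.arange(M)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def eig_desc(M: np.ndarray):
    """Eigendecomposition with eigenvalues sorted in decreasing order."""
    w, U = np.linalg.eigh(M)
    order = np.argsort(w)[::-1]
    return w[order], U[:, order]

MT, MR, KN = 3, 2, 4
A, B = exp_corr(MT, 0.7), exp_corr(MR, 0.4)
R = exp_corr(KN, 0.9)                 # stand-in for the KN x KN time-frequency correlation

sigma, UA = eig_desc(A)
lam, UB = eig_desc(B)

full = np.kron(B, np.kron(A, R))                          # B kron A kron R
rot = np.kron(UB, np.kron(UA, np.eye(KN)))                # spatial rotation only
decorrelated = rot.conj().T @ full @ rot
target = np.kron(np.diag(lam), np.kron(np.diag(sigma), R))  # Lambda kron Sigma kron R

print(np.allclose(decorrelated, target))
```

The rotated matrix consists of $M_T M_R$ diagonal blocks, each a scaled copy of the time-frequency correlation matrix, which is the block-diagonal structure exploited in the proofs.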

G. Advantages and Limitations of the Model 

The channel model just presented is fairly general: it allows
for correlation in space and for selectivity in time and frequency.
Hence, we can dispense with the often used block-fading assumption
in time and with the assumption of independent subchannels
in frequency. Fortunately, the generality of our model does
not come at the price of high modeling complexity as only
the scattering function and the spatial correlation matrices
$\mathbf{A}$ and $\mathbf{B}$ are needed to describe the distribution
of the channel coefficients $\{h_{r,q}[k,n]\}$. Both the scattering
function and the spatial correlation matrices can be obtained from
channel measurements [20], [21], [9], so that the model can be
directly related to real-world channels.

Modeling is synonymous with making assumptions and sim- 
plifications. We briefly discuss and justify our key assumptions. 
• The assumption that transmitter and receiver do not know 
the channel realization is accurate, as in a practical system 
channel realizations can only be inferred from the received 
signal. The rates achievable with specific methods to ob- 
tain CSI, like training schemes, cannot exceed the capacity 
of the channel in the noncoherent setting. 



• Virtually all wireless channels are highly underspread:
extremely dispersive outdoor channels with fast moving
terminals may have a spread of $\Delta_{\mathbb{H}} \approx 10^{-2}$,
while for slowly varying indoor channels typically
$\Delta_{\mathbb{H}} \approx 10^{-7}$.

• The Weyl-Heisenberg transmission set $\{g_{k,n}(t)\}$ can be
interpreted as pulse-shaped (PS) orthogonal frequency-division
multiplexing (OFDM); hence, the model we use
in our information-theoretic analysis is directly related to a
practical transmission scheme.

• We neglect the error incurred by the approximation of the
kernel $k_{\mathbb{H}}(t,t')$ in (2), which is equivalent to
neglecting intersymbol and intercarrier interference in the
corresponding PS-OFDM system interpretation [19]. Yet, if the pulse
$g(t)$ and $T$ and $F$ are chosen so as to optimally mitigate
intersymbol and intercarrier interference, i.e., if they are matched
to the channel's scattering function [15], [22], [19], we conjecture
that the resulting approximation error in (2) is smaller
than the corresponding error incurred if either conventional
cyclic prefix OFDM or direct sampling of $k_{\mathbb{H}}(t,t')$ and
truncation of the resulting sample sequence (e.g., see [5])
is used to analyze underspread WSSUS channels. In fact,
these last two decompositions are, in general, not matched
to the channel's scattering function.

• The scattering function models small-scale fading, i.e., the
statistical variation of the channel as transmitter, receiver, 
or objects in the propagation environment are displaced 
by a few wavelengths [23]. Therefore, if the antennas at 
each terminal are spaced only a few wavelengths apart, the 
component channels may be well modeled by the same 
scattering function. 

• We assume that the component channels are spatially cor- 
related according to the separable correlation model [8], 
[9]. This assumption is common in theoretical analyses 
of MIMO channels because it greatly simplifies analytical 
developments. Shortcomings of this model are discussed 
in [24], [25]. 

• We assume that spatial correlation does not change over 
time and frequency. This assumption is valid only over 
a limited time duration and bandwidth, as it requires the 
antenna patterns to be constant over frequency and the 
configuration of dominant scattering clusters to be constant 
over time. 

• The constraint on the peak power across antennas is a 
reasonable model for a regulatory limit on the total isotropic 
radiated peak power. If the peak limitation arises from the 
power amplifiers in the individual transmit chains, a peak 
constraint per antenna should be used instead. 

III. Capacity Bounds 

With the system model and power constraints in place, we
can now proceed to evaluate upper and lower bounds on the
capacity of the channel with IO relation (8). Although all results
to follow pertain to the channel model described in Section II-D
under the power constraints in Section II-E, we use the spatially
decorrelated channel and the rotated input and output vectors
introduced in Section II-F to simplify the exposition of the
proofs.

As we assume that for all $(r,q)$ the process $\{h_{r,q}[k,n]\}$ has
a spectral density, given in (3), $\{h_{r,q}[k,n]\}$ is ergodic in $k$
for all component channels [26], and the capacity is given by [27,
Chapter 12]

$$C(W) = \lim_{K \to \infty} \frac{1}{KT} \sup_{\mathcal{P}} I(\mathbf{y};\mathbf{x}) \tag{14}$$

for any fixed bandwidth $W = NF$. The supremum is taken over
the set $\mathcal{P}$ of all input distributions that satisfy the
constraints on peak and average power in Section II-E.

A. Upper Bound 

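Theorem 1: The capacity of the underspread WSSUS
MIMO channel in Section II-D under the power constraints in
Section II-E is upper-bounded as $C(W) \le U_1(W)$, where

$$U_1(W) = \sup_{0 \le \alpha \le \sigma_0} \sum_{r=0}^{M_R-1} \left[ \frac{W}{TF} \log\left(1 + \alpha\lambda_r\frac{PTF}{W}\right) - \alpha\,G_r(W) \right] \tag{15a}$$

$$G_r(W) = \frac{W}{\sigma_0\beta} \int_\tau \int_\nu \log\left(1 + \frac{\sigma_0\lambda_r\beta P}{W}\,C_{\mathbb{H}}(\nu,\tau)\right) d\nu\,d\tau. \tag{15b}$$

Proof: Let $\mathcal{Q}$ be the set of input distributions that
satisfy the peak constraint (10) and the average-power constraint

$$\frac{1}{T} \sum_{q=0}^{M_T-1} \sigma_q\,\mathbb{E}[\|\mathbf{x}_q\|^2] \le \sigma_0 KP. \tag{16}$$

As $\sum_{q=0}^{M_T-1} \sigma_q\,\mathbb{E}[\|\mathbf{x}_q\|^2] \le
\sigma_0 \sum_{q=0}^{M_T-1} \mathbb{E}[\|\mathbf{x}_q\|^2] =
\sigma_0\,\mathbb{E}[\|\mathbf{x}\|^2]$, any input distribution
that satisfies the average-power constraint
$\mathbb{E}[\|\mathbf{x}\|^2]/T \le KP$ also satisfies (16), so that
$\mathcal{P} \subseteq \mathcal{Q}$. To upper-bound $C(W)$, we
replace the supremum over $\mathcal{P}$ in (14) with a supremum over
$\mathcal{Q}$ and then use the chain rule for mutual information and
split the supremum over $\mathcal{Q}$:

$$\sup_{\mathcal{P}} I(\mathbf{y};\mathbf{x}) \le \sup_{\mathcal{Q}} I(\mathbf{y};\mathbf{x}) \le \sup_{0 \le \alpha \le \sigma_0} \left\{ \sup_{\mathcal{Q}|_\alpha} I(\mathbf{y};\mathbf{x},\mathbf{h}) - \inf_{\mathcal{Q}|_\alpha} I(\mathbf{y};\mathbf{h} \mid \mathbf{x}) \right\} \tag{17}$$

where the distributions in the restricted set $\mathcal{Q}|_\alpha$
satisfy the equality constraint
$\mathbb{E}\big[\sum_{q=0}^{M_T-1} \sigma_q\|\mathbf{x}_q\|^2\big] =
\alpha KPT$ and the peak constraint (10).

To upper-bound $\sup_{\mathcal{Q}|_\alpha}
I(\mathbf{y};\mathbf{x},\mathbf{h})$, we drop the peak constraint and
take $(\mathbf{I}_{M_R} \otimes \mathbf{X})\mathbf{h}$ as JPG
distributed with block-diagonal correlation matrix
$\boldsymbol{\Lambda} \otimes \mathbb{E}[\mathbf{X}(\boldsymbol{\Sigma}
\otimes \mathbf{R})\mathbf{X}^H]$. Then,

$$\begin{aligned}
I(\mathbf{y};\mathbf{x},\mathbf{h}) &\overset{(a)}{\le} \sum_{r=0}^{M_R-1} \log\det\left(\mathbf{I}_{KN} + \lambda_r \sum_{q=0}^{M_T-1} \sigma_q\,\mathbb{E}[\mathbf{x}_q\mathbf{x}_q^H] \odot \mathbf{R}\right) \\
&\overset{(b)}{\le} \sum_{r=0}^{M_R-1} \sum_{n=0}^{N-1} \sum_{k=0}^{K-1} \log\left(1 + \lambda_r \sum_{q=0}^{M_T-1} \sigma_q\,\mathbb{E}[|x_q[k,n]|^2]\right) \\
&\overset{(c)}{\le} KN \sum_{r=0}^{M_R-1} \log\left(1 + \alpha\lambda_r\frac{PT}{N}\right).
\end{aligned} \tag{18}$$

Here, (a) follows from the assumption that $(\mathbf{I}_{M_R} \otimes
\mathbf{X})\mathbf{h}$ is JPG distributed, from the block-diagonal
structure of its correlation matrix, and because
$\mathbf{X}(\boldsymbol{\Sigma} \otimes \mathbf{R})\mathbf{X}^H =
\sum_{q=0}^{M_T-1} \sigma_q \mathbf{X}_q\mathbf{R}\mathbf{X}_q^H =
\sum_{q=0}^{M_T-1} \sigma_q \mathbf{x}_q\mathbf{x}_q^H \odot
\mathbf{R}$; Hadamard's inequality and the normalization
$R[0,0] = 1$ give (b); finally, (c) follows from Jensen's
inequality.

The derivation of a lower bound on $\inf_{\mathcal{Q}|_\alpha}
I(\mathbf{y};\mathbf{h} \mid \mathbf{x})$ is more involved. Our proof
is similar to the proof of the corresponding SISO result in [19,
Theorem 1]; therefore, we highlight the novel steps only:

$$\inf_{\mathcal{Q}|_\alpha} I(\mathbf{y};\mathbf{h} \mid \mathbf{x})$$

Here, (a) follows from the block-diagonal structure of
$\boldsymbol{\Lambda} \otimes \mathbf{X}(\boldsymbol{\Sigma} \otimes
\mathbf{R})\mathbf{X}^H$; to obtain (b), we multiply and divide by
$\sum_{q=0}^{M_T-1} \sigma_q\|\mathbf{x}_q\|^2$, and to get (c) we
replace the first factor in the expectation by its infimum over
all input vectors that satisfy the peak constraint (10);
(d) follows because $\mathbb{E}\big[\sum_{q=0}^{M_T-1}
\sigma_q\|\mathbf{x}_q\|^2\big] = \alpha KPT$ and because
$\det(\mathbf{I}_N + \mathbf{A} \odot \mathbf{B}) \ge
\det(\mathbf{I}_N + (\mathbf{I}_N \odot \mathbf{A})\mathbf{B})$ for two
$N \times N$ nonnegative definite matrices $\mathbf{A}$ and
$\mathbf{B}$, a determinant inequality that we prove in Appendix A;
finally, (e) is a consequence [19, Appendix B] of the relation
between mutual information and minimum mean square estimation
error [28]. To conclude the proof, we note that the bounds on both
terms on the right-hand side (RHS) of (17) no longer depend on $K$
upon division by $KT$. ■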
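The determinant inequality invoked in step (d) can be probed numerically. The following sketch is a sanity check on randomly drawn nonnegative definite matrices, not a substitute for the proof in Appendix A; the matrix size, the number of trials, and the helper names are arbitrary choices made for this example.

```python
import numpy as np

def random_psd(n: int, rng) -> np.ndarray:
    """Random complex nonnegative definite matrix of the form G G^H."""
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return G @ G.conj().T

def logdet_gap(A: np.ndarray, B: np.ndarray) -> float:
    """log det(I + Hadamard(A, B)) - log det(I + diag(A) B); nonnegative if the inequality holds."""
    n = A.shape[0]
    I = np.eye(n)
    lhs = np.linalg.slogdet(I + A * B)[1]         # A * B is the element-wise (Hadamard) product
    rhs = np.linalg.slogdet(I + (I * A) @ B)[1]   # I * A keeps only the diagonal of A
    return float(lhs - rhs)

rng = np.random.default_rng(2)
gaps = [logdet_gap(random_psd(4, rng), random_psd(4, rng)) for _ in range(200)]
print(min(gaps))
```

Working with log-determinants through `numpy.linalg.slogdet` avoids overflow for larger matrices and makes the comparison a simple sign check.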
1 ) The Supremum ofUi(W) : As the value of a that achieves 
the supremum in ( |15a| > depends on W in general, the upper 
bound Ui(W) is difficult to interpret. However, for the special 
case that the supremum is attained for a = oo independently 
of W, the upper bound can be interpreted as the capacity of a 
set of Mji parallel AWGN channels with received power cr A r P 
and W/ (TF) degrees of freedom per second, minus a penalty 
term that quantifies the capacity loss because of channel uncer- 



6 



tainty. We show in Appendix |B] that a sufficient condition for 
the supremum in (|15a|) to be achieved for a = <tq is 



(19a) 



B. Lower Bound 



A H < ft/(3TF) 



and 



P A H 

< — < — 

~ W (ToAo/3 



exp 



ft 



2TFAit 



- 1 



(19b) 



As virtually all wireless channels are highly underspread, as ft > 
1 and, typically, TF ss 1.25, condition (|19a|i is always satisfied, 



so that the only relevant condition is ( 19b i; but even for large 
channel spread, this condition holds for all SNR values P/W of 
practical interest. As an example, consider a system with ft = 1, 
and Mt = Mr = 4 that operates over a channel with 
spread Ah = 1CP 2 . If we use the upper bound eroA < 
MrMt, which follows from the normalization tr(A) = Mt 
and tr(B) = M R , we find fro m p9] > that P/W < 141 dB is 
sufficient for the supremum in ( |15a| > to be achieved for a = <jq. 
This value is far in excess of the receive SNR encountered 
in practical systems. Therefore, we exclusively consider the 
case a = Co in the remainder of the paper. 

2) The Penalty Term: What we call the "penalty 
term", i.e., anT.TJT GJW) in (15i, is a lower bound 



on inf gi I(y; h | x). For SISO channels, it is shown in [19] 
that of all unit-volume scattering functions with prescribed v$ 
and To, the brick-shaped scattering function, Ch(V, t) = 1/Ah 
for (v, t) £ [—Vqi vq\ x [—To, r ], results in the largest penalty 
term. The same is true for the MIMO channel at hand, where 
the corresponding capacity is upper-bounded as 



C(W)< 



r^f W . A . PTF\ 

r=0 k v 7 

WA n ( RP 
log 1 + a X r 



ft 



(20) 



The upper bound $20) depends on the channel spread Ah and 
the PAPR ft only through their ratio, so that a decrease in Ah 
has the same effect on the upper bound as an increase in the 
PAPR ft of the input signal. 

3) Spatial Correlation and Number of Antennas: The upper 
bound U\(W) depends on the transmit correlation matrix A 
only through its maximum eigenvalue a , which plays the role 
of a power gain. This observation shows that rank-one statistical 
beamforming along any eigenvector of A corresponding to ao is 
optimal whenever U\ (W) is tight. At high P/W and correspond- 
ingly small bandwidth, Ui(W) increases linearly in the number 
of nonzero eigenvalues of the receive correlation matrix, that is, 
in rank(B). As the capacity in the coherent setting, which is a 
simple upper bound on C(W), increases at high P/W linearly 
only in the minimum of rank(A) and rank(B) [17, Proposition 
4], we conclude that U\ (W) is not tight at high P/W. However, 
for large bandwidth and corresponding small P/W, we show 



in Section IV that U\(W) is tight and that rank-one statistical 



Theorem 2: Let C(6) denote the N x N matrix -valued spec- 
tral density of an arbitrary component channej^]{h[fc]}, i.e., 



C(0) = K[h[k' + k]h H [k'}] 



-i2-irk6 



\e\<- 2 . 



Furthermore, let s denote an Mt -dimensional vector whose 
first Q elements are i.i.d. and of constant modulus — they have 
zero mean and satisfy |[s] ? | = PT/(QN) — and let the remain- 
ing Mt—Q elements be zero. Let H w be an Mr x Mt matrix and 
let w be an M^-dimensional vector, both with i.i.d. JPG entries 
of zero mean and unit variance. Finally, denote by I(y; s | H w ) 
the coherent mutual information of the memoryless fading 
MIMO channel with 10 relation y = A 1 / 2 H w S 1 / 2 s + w. Then, 
the capaci ty (fT4| ) of the underspread WSSUS MIMO channel 
in Section |II-D| under the power constraints in Section II-E is 
lower-bounded as C(W) > max^ <q< m t L±(W, Q), where 

f W 

Lt(W,Q) = max -W(y; Vl* H w ) 

l<7</3 I 7-1 -T 
, Q-IMr-1 1/ f 2 , v 

7 YEE J logdet^ + ^A^C^J^ . 



q=0 r=0 



-1/2 



(21) 



Proof: Any specific input distribution leads to a lower 
bound on capacity; in particular, we choose to transmit con- 
stant modulus symbols x g [k, n] — s q [k,n] that are i.i.d. over 
time, frequency, and eigenmodes, and that satisfy \s q [k, n)\ 2 = 
PT/{QN) w.p.l for all k,n and for q = 0,1,. .. ,Q - 1. 
The remaining Mt — Q eigenmodes are not used to transmit 
information. We stack the symbols s q [k, n] as in (|7ji and define 
the KN x M T KN matrix 

S = [So Si • • ■ Sq_i Okn ■ ■ ■ Okn] 

with S q — diag{s g } and where the last Mt — Q entries are 
all-zero matrices Okn- Next, we use 

7(y; S )>7(y; S |h)-7(y;h|s) (22) 

and bound the two terms on the RHS of |22| separately. Because 
the input is i.i.d., I(y; s | h) = KN I(y; s | H w ). The second 
term on the RHS of |22} can be evaluated as 

Mr-1 

/(y;h|s)= J2 E[logdet(l K 7v + A r S(S0R)S H )] 

(a) ^l^ 1 

< logdet(l Q K Ar + A I .E[S ff S](E®R)) 

r=0 

logdetfl K Ar + cr 9 A r — R 

q=0 r=0 ^ ^ 



(b) 



beamforming is indeed optimal in the wideband regime. 



where (a) follows from Jensen's inequality because the
log-determinant expression is concave in $\mathbf{S}^H \mathbf{S}$ [29],
and (b) follows because the $\{s_q[k,n]\}$ are i.i.d. and have zero
mean and constant modulus $|s_q[k,n]|^2 = PT/(QN)$. We combine the
two terms, set $W = NF$, divide by $KT$, and evaluate the limit for
$K \to \infty$ by means of [30, Theorem 3.4], a generalization of
Szegő's theorem for multilevel Toeplitz matrices. The resulting
lower bound can then be improved upon via time sharing: Let
$1 \le \gamma \le \beta$. We transmit during a fraction $1/\gamma$ of
the transmission time, with the symbol power scaled by $\gamma$ so
that the average power is unchanged, and let the transmitter be
silent otherwise. ■

2 The vector processes $\mathbf{h}_{r,q}[k]$ of all component channels $(r, q)$ have the same
spectral density by assumption; therefore, we drop the subscripts $r$ and $q$.
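Steps (a) and (b) in the proof can be checked numerically on a small example. The following sketch is not part of the original derivation: the correlation matrix (entries $\rho^{|i-j|}$), the dimension, the eigenvalues, and the symbol power are illustrative assumptions, and the expectation is replaced by a Monte Carlo average over random constant-modulus phases.

```python
import numpy as np

rng = np.random.default_rng(0)

def toeplitz_corr(n, rho):
    """Hermitian Toeplitz matrix with entries rho**|i-j| (illustrative stand-in for R)."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def logdet(m):
    """Log-determinant of a Hermitian positive definite matrix."""
    return np.linalg.slogdet(m)[1]

n = 8                          # plays the role of KN (assumed small for the demo)
sigma = np.array([1.7, 1.0])   # eigenvalues of the Q = 2 active transmit eigenmodes (assumed)
lam = 2.6                      # one receive eigenvalue lambda_r (assumed)
power = 0.5                    # constant modulus |s_q|^2, stands for PT/(QN) (assumed)
R = toeplitz_corr(n, 0.9)

# Right-hand side after step (b): sum_q logdet(I + sigma_q * lambda_r * power * R)
rhs = sum(logdet(np.eye(n) + s * lam * power * R) for s in sigma)

# Left-hand side: average of logdet(I + lambda_r * S (Sigma kron R) S^H)
trials = 200
acc = 0.0
for _ in range(trials):
    phases = np.exp(2j * np.pi * rng.random((len(sigma), n)))
    # S = [S_0 S_1], each S_q diagonal with constant-modulus entries
    S = np.hstack([np.diag(np.sqrt(power) * p) for p in phases])
    cov = S @ np.kron(np.diag(sigma), R) @ S.conj().T
    acc += logdet(np.eye(n) + lam * cov)
lhs = acc / trials

print(f"E[logdet] estimate: {lhs:.4f}  <=  bound: {rhs:.4f}")
```

For a single active eigenmode the two sides coincide, because a diagonal constant-modulus matrix is a scaled unitary matrix; the gap in the example comes from using two eigenmodes.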
Wideband Approximation of the Lower Bound: For large
enough bandwidth, and hence large enough $N$, the lower bound
in Theorem 2 can be well approximated by an expression that
is often much easier to evaluate: (i) We replace the first term
of $L_1(W,Q)$ by its Taylor series expansion up to first order,
as given in [31, Theorem 3]. This expansion requires the
computation of the expectation of the trace of several terms that
involve the channel matrix $\mathbf{\Lambda}^{1/2} \mathbf{H}_w \mathbf{\Sigma}^{1/2}$.
Lemmas 3 and 4 in [32] provide the desired result. (ii) An
approximation of the second term results if we replace the
$N \times N$ Toeplitz matrix $\mathbf{C}(\theta)$ by a circulant matrix
that is, in $N$, asymptotically equivalent to $\mathbf{C}(\theta)$ [19].
The resulting wideband approximation of $L_1(W,Q)$ then reads



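$$
L_a(W,Q) = \max_{1 \le \gamma \le \beta} \Biggl\{
\frac{M_R P}{Q} \sum_{q=0}^{Q-1} \sigma_q
- \frac{\gamma P^2\, TF}{2 Q^2 W} \Biggl[ \Bigl( \sum_{q=0}^{Q-1} \sigma_q \Bigr)^{2} \sum_{r=0}^{M_R-1} \lambda_r^2
+ M_R^2 \sum_{q=0}^{Q-1} \sigma_q^2 \Biggr]
- \frac{W}{\gamma} \sum_{q=0}^{Q-1} \sum_{r=0}^{M_R-1} \iint
\log\!\left( 1 + \frac{\gamma P}{QW}\, \sigma_q \lambda_r\, C_H(\nu,\tau) \right) d\nu\, d\tau \Biggr\}.
\tag{23}
$$

This approximation is exact for $W \to \infty$ [19].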
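The replacement of a Toeplitz matrix by an asymptotically equivalent circulant matrix in step (ii) can be illustrated on a scalar toy example. The sketch below is an assumption-laden illustration, not the matrix $\mathbf{C}(\theta)$ of the paper: it uses a correlation sequence $\rho^{|k|}$, whose spectrum is known in closed form, and compares the normalized log-determinant of the Toeplitz matrix with the average over the eigenvalues of the circulant approximation.

```python
import numpy as np

def toeplitz_logdet_rate(n, rho, c):
    """(1/n) logdet(I + c*T_n) for the Toeplitz matrix T_n with entries rho**|i-j|."""
    idx = np.arange(n)
    T = rho ** np.abs(idx[:, None] - idx[None, :])
    return np.linalg.slogdet(np.eye(n) + c * T)[1] / n

def circulant_logdet_rate(n, rho, c):
    """Same quantity with T_n replaced by a circulant matrix whose eigenvalues
    are samples of the spectrum of the sequence rho**|k|."""
    omega = 2 * np.pi * np.arange(n) / n
    spectrum = (1 - rho**2) / (1 - 2 * rho * np.cos(omega) + rho**2)
    return np.mean(np.log1p(c * spectrum))

rho, c = 0.5, 1.0   # illustrative values (assumed)
gaps = {}
for n in (16, 64, 512):
    gaps[n] = abs(toeplitz_logdet_rate(n, rho, c) - circulant_logdet_rate(n, rho, c))
    print(f"n = {n:4d}: |Toeplitz - circulant| = {gaps[n]:.2e}")
```

The gap shrinks as the dimension grows, which is the behavior that makes the circulant approximation attractive for large $N$.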



C. Numerical Examples 

For a $3 \times 3$ MIMO system, we show in this section plots of
the upper bound $U_1(W)$ of Theorem 1 and, for $Q$ between 1
and 3, plots of the lower bound $L_1(W,Q)$ of Theorem 2 and
of the corresponding approximation $L_a(W,Q)$ in (23). The
large-bandwidth behavior of the bounds will be substantiated
in Section IV.

Numerical Evaluation of the Lower Bound: While the upper
bound $U_1(W)$ for $\alpha = \sigma_0$ can be efficiently evaluated, direct
numerical evaluation of the lower bound $L_1(W,Q)$ is difficult
for large $N$. First, it is necessary to numerically compute the
mutual information $I(\mathbf{y}; \sqrt{\gamma}\,\mathbf{s} \mid \mathbf{H}_w)$
for constant modulus inputs; second, the eigenvalues of the
$N \times N$ matrix $\mathbf{C}(\theta)$ are required for the evaluation
of the penalty term in (21). While efficient numerical algorithms
exist to solve the first task [33], the second task is often
challenging, especially if $N$ is large. In [19], we present upper
and lower bounds on the penalty term in (21) that are more amenable
to numerical evaluation. For the set of parameters considered in
the next subsection, these bounds are tight and allow us to fully
characterize $L_1(W,Q)$ numerically.

Fig. 1. Upper and lower bounds on the capacity of a spatially uncorrelated
underspread WSSUS channel with $\mathbf{\Sigma} = \mathbf{\Lambda} = \mathbf{I}_3$, $M_T = M_R = 3$, $\beta = 1$,
and $\Delta_H = 10^{-3}$ (horizontal axis: bandwidth [GHz]).

Parameter Settings: All plots are for a receive power normalized
with respect to the noise spectral density of
$P/(1\,\mathrm{W/Hz}) = 1.26 \cdot 10^{8}\,\mathrm{s}^{-1}$. This parameter
value corresponds, for example, to a transmit power of 0.5 mW, a
thermal noise level at the receiver of −174 dBm/Hz, free-space path
loss over a distance of 10 m, and a rather conservative receiver
noise figure of 20 dB. Furthermore, we assume that the scattering
function is brick shaped with $\tau_0 = 5\,\mu\mathrm{s}$,
$\nu_0 = 50$ Hz, and corresponding spread $\Delta_H = 10^{-3}$.
Finally, we set $\beta = 1$. For this set of parameter values, we
analyze three different scenarios: a spatially uncorrelated channel,
spatial correlation at the receiver only, and spatial correlation
at the transmitter only.
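The link-budget numbers quoted above can be cross-checked with a few lines of arithmetic. The sketch below backs out the path loss implied by the stated transmit power, noise level, noise figure, and normalized receive power; it additionally assumes the Friis free-space formula to translate this path loss at 10 m into a carrier frequency, and assumes that the spread of the brick-shaped scattering function is $\Delta_H = 4 \tau_0 \nu_0$ (support $[-\tau_0, \tau_0] \times [-\nu_0, \nu_0]$). Neither assumption is spelled out in this section.

```python
import math

# Values quoted in the text
tx_power_dbm = 10 * math.log10(0.5)        # 0.5 mW expressed in dBm
noise_psd_dbm_hz = -174.0 + 20.0           # thermal noise floor plus 20 dB noise figure
p_over_n0_db = 10 * math.log10(1.26e8)     # normalized receive power in dB-Hz

# Path loss implied by these three numbers
implied_path_loss_db = tx_power_dbm - noise_psd_dbm_hz - p_over_n0_db

# Carrier frequency for which free-space loss at 10 m equals the implied loss
# (Friis formula; assumption made for illustration only)
distance_m = 10.0
speed_of_light = 3.0e8
implied_carrier_hz = (speed_of_light / (4 * math.pi * distance_m)
                      * 10 ** (implied_path_loss_db / 20))

# Spread of the brick-shaped scattering function (assumed Delta_H = 4 * tau_0 * nu_0)
tau0, nu0 = 5e-6, 50.0
spread = 4 * tau0 * nu0

print(f"implied path loss : {implied_path_loss_db:.2f} dB")
print(f"implied carrier   : {implied_carrier_hz / 1e9:.2f} GHz")
print(f"channel spread    : {spread:.1e}")
```

Under these assumptions the implied path loss is close to 70 dB, and the spread matches the stated $\Delta_H = 10^{-3}$.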

1) Spatially Uncorrelated Channel: Fig. 1 shows the
upper bound $U_1(W)$ and, for $Q$ between 1 and 3, the
lower bound $L_1(W,Q)$ and the corresponding approximation
$L_a(W,Q)$ for the spatially uncorrelated case
$\mathbf{\Sigma} = \mathbf{\Lambda} = \mathbf{I}_3$. For comparison, we also
plot a standard capacity upper bound $U_c(W)$ obtained for the
coherent setting and with input subject to an average-power
constraint only. We can observe that $U_c(W)$ is tighter than
$U_1(W)$ for small bandwidth; this holds true in general as for
small $W$ the penalty term in (13) can be neglected and $U_1(W)$
in the spatially uncorrelated case reduces to

$$
U_1(W) \approx \frac{M_R W}{TF} \log\!\left( 1 + \frac{PTF}{W} \right)
$$

which is the Jensen upper bound on the capacity $U_c(W)$ in the
coherent setting. For small and medium bandwidth, the lower
bound $L_1(W,Q)$ increases with $Q$ and comes surprisingly close
to the coherent capacity upper bound $U_c(W)$ for $Q = 3$.

As can be expected in the light of, e.g., [5], [6], when bandwidth
increases above a certain critical bandwidth, both $U_1(W)$
and $L_1(W,Q)$ start to decrease; in this regime, the rate gain
resulting from the additional degrees of freedom is offset by
the resources required to resolve channel uncertainty. The same
argument seems to hold in the wideband regime for the degrees of
freedom provided by multiple transmit antennas: $U_1(W)$ appears
to match $L_1(W,Q)$ for $Q = 1$; hence, using a single transmit
antenna seems optimal in the wideband regime.

2) Impact of Receive Correlation: Fig. 2 shows the same
bounds as before, but evaluated with spatial correlation
$\mathbf{\Lambda} = \mathrm{diag}\{[2.6 \;\; 0.3 \;\; 0.1]^T\}$ at the receiver and a
spatially uncorrelated channel at the transmitter, i.e.,
$\mathbf{\Sigma} = \mathbf{I}_3$. The curves in Fig. 2 are very similar to the
ones shown in Fig. 1 for the spatially uncorrelated case, yet they
are shifted towards higher bandwidth while the maximum rate is
lower. Hence, at least for the example at hand, receive correlation
decreases capacity at small bandwidth but it is beneficial at
large bandwidth.

Fig. 2. Upper and lower bounds on the capacity of an underspread WSSUS
channel that is spatially uncorrelated at the transmitter, $\mathbf{\Sigma} = \mathbf{I}_3$, but correlated
at the receiver with $\mathbf{\Lambda} = \mathrm{diag}\{[2.6 \;\; 0.3 \;\; 0.1]^T\}$; $M_T = M_R = 3$, $\beta = 1$,
and $\Delta_H = 10^{-3}$ (horizontal axis: bandwidth [GHz], 0.01 to 1000).

Fig. 3. Upper and lower bounds on the capacity of an underspread WSSUS
channel that is correlated at the transmitter with $\mathbf{\Sigma} = \mathrm{diag}\{[1.7 \;\; 1.0 \;\; 0.3]^T\}$
and spatially uncorrelated at the receiver, $\mathbf{\Lambda} = \mathbf{I}_3$; $M_T = M_R = 3$, $\beta = 1$,
and $\Delta_H = 10^{-3}$ (horizontal axis: bandwidth [GHz]).

3) Impact of Transmit Correlation: We evaluate the same
bounds once more, but this time for spatial correlation
$\mathbf{\Sigma} = \mathrm{diag}\{[1.7 \;\; 1.0 \;\; 0.3]^T\}$ at the transmitter and a
spatially uncorrelated channel at the receiver, i.e.,
$\mathbf{\Lambda} = \mathbf{I}_3$. The corresponding curves are shown in Fig. 3.
Here, transmit correlation increases the capacity at large
bandwidth, while its impact at small bandwidth is more difficult
to judge because the distance between upper and lower bound
increases compared to the spatially uncorrelated case.

All three figures show that for large bandwidth the approximation
$L_a(W,Q)$ of $L_1(W,Q)$ is quite accurate. An observation
of significant practical importance is that the bounds $U_1(W)$
and $L_1(W,Q)$ are quite flat over a large range of bandwidth
around their maxima. Further numerical results point at the
following: (i) for smaller values of the channel spread $\Delta_H$,
these maxima broaden and extend towards higher bandwidth; (ii) an
increase in $\beta$ increases the gap between upper and lower bounds.



IV. The Wideband Regime

The numerical results in Section III-C suggest that in the
wideband regime (i) using a single transmit antenna is optimal
when the channel is spatially uncorrelated at the transmitter side;
(ii) it is optimal to signal over the maximum transmit eigenmode
if transmit correlation is present; (iii) both transmit and receive
correlation are beneficial. To substantiate these observations,
we compute the first-order Taylor series expansion of $C(W)$
around $1/W = 0$.

Theorem 3: Define

$$
\kappa_H = \iint C_H^2(\nu,\tau)\, d\nu\, d\tau
\qquad \text{and} \qquad
\vartheta = \sum_{r=0}^{M_R-1} \lambda_r^2.
\tag{24}
$$

Then, for $\beta > 2TF/\kappa_H$, the capacity (14) of the underspread
WSSUS MIMO channel in Section II-D under the power constraints
in Section II-E has the following first-order Taylor series
expansion around $1/W = 0$:

$$
C(W) = \frac{a}{W} + o\!\left( \frac{1}{W} \right)
\tag{25a}
$$

where

$$
a = \frac{\vartheta\, (\sigma_0 P)^2}{2}\, (\beta \kappa_H - TF).
\tag{25b}
$$
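For the brick-shaped scattering function used in Section III-C, the quantity $\kappa_H$ and the condition on $\beta$ in Theorem 3 are easy to evaluate. The sketch below assumes that the scattering function is constant on $[-\nu_0, \nu_0] \times [-\tau_0, \tau_0]$ and normalized to unit volume, so that its height is $1/\Delta_H$; the value $TF = 1.25$ is the one quoted later in this section for Fig. 1.

```python
import math

# Brick-shaped scattering function (assumed: constant on its support, unit volume)
tau0, nu0 = 5e-6, 50.0
spread = 4 * tau0 * nu0          # area of the support, Delta_H
height = 1.0 / spread            # constant value of C_H on the support

volume = height * spread         # integral of C_H, should be 1
kappa_h = height**2 * spread     # integral of C_H squared, equals 1 / Delta_H

beta, tf = 1.0, 1.25             # parameters of the numerical example
threshold = 2 * tf / kappa_h     # Theorem 3 requires beta > 2TF / kappa_H

print(f"kappa_H = {kappa_h:.1f}")
print(f"2TF/kappa_H = {threshold:.2e}, condition satisfied: {beta > threshold}")
```

With these values $\kappa_H = 10^3$, so the condition on $\beta$ holds with a large margin for $\beta = 1$.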



Proof: The proof is a generalization of a similar proof for 
SISO channels in [19, Appendices E and G]; therefore, we only 
sketch the main steps. 

First, we expand the upper bound in Theorem 1 into a Taylor
series. If the channel is highly underspread, the sufficient
condition (19) for $\alpha = \sigma_0$ to achieve the supremum in (15a)
is valid for large enough bandwidth and hence for $W \to \infty$.
Therefore, we only need to expand $U_1(W)$ for $\alpha = \sigma_0$.
A more refined analysis in [19, Appendix E] shows that the
supremum in (15a) is achieved for $\alpha = \sigma_0$ in the
large-bandwidth regime if and only if $\beta > 2TF/\kappa_H$, a
condition less restrictive than (19a).

It follows from [19, Appendix F] that a Taylor series expansion
of the lower bound $L_1(W,Q)$ in Theorem 2 does not
match the corresponding expansion of $U_1(W)$ up to first order,
so that we need to devise an alternative, asymptotically tight,
lower bound. We observed in Section III-C that signaling over
a single transmit eigenmode seems to be optimal for large
bandwidth; hence, it is sensible to base the asymptotic lower
bound on a signaling scheme that uses only the strongest transmit
eigenmode in each slot. In one channel use, we thus transmit
$\mathbf{x} = [\mathbf{x}_0^T \;\; \mathbf{0}_{KN}^T \;\; \cdots \;\; \mathbf{0}_{KN}^T]^T$,
where $\mathbf{x}_0$ stands for the input vector transmitted on the
strongest eigenmode. Such a signaling scheme, often referred to as
rank-one statistical beamforming, transmits over all available
antennas in general; only if the channel is spatially uncorrelated
at the transmitter can antennas be physically switched off. With
rank-one statistical beamforming, the spatially decorrelated MIMO
channel with IO relation (8) simplifies to a single-input
multiple-output (SIMO) channel, the IO relation of which can be
conveniently expressed as

$$
\mathbf{y} = \mathbf{h} \odot \mathbf{x} + \mathbf{w}
$$

where $\mathbf{w}$ is an $M_R KN$-dimensional JPG vector with i.i.d.
entries of zero mean and unit variance, the input vector
$\mathbf{x} = [\mathbf{x}_0^T \;\; \cdots \;\; \mathbf{x}_0^T]^T$ contains $M_R$
copies of $\mathbf{x}_0$, and the stacked effective SIMO channel vector is
$\mathbf{h} = [\mathbf{h}_{0,0}^T \;\; \mathbf{h}_{1,0}^T \;\; \cdots \;\; \mathbf{h}_{M_R-1,0}^T]^T$
with correlation matrix
$\mathbf{R}_{\mathbf{h}} = \mathrm{E}[\mathbf{h}\mathbf{h}^H] = \sigma_0 \mathbf{\Lambda} \otimes \mathbf{R}$.
The desired asymptotic lower bound now follows directly from the
derivation of the asymptotic lower bound for a time-frequency
selective SISO channel in [19, Appendix G]. In particular, we choose
$\mathbf{x}_0$ to be the product of a vector with i.i.d. zero mean
constant modulus entries and a nonnegative binary random variable
with on-off distribution. Similar signaling schemes were already
used in [7] to prove asymptotic capacity results for frequency-flat,
time-selective channels. As the first-order Taylor expansion of the
resulting lower bound matches the first-order Taylor expansion
of $U_1(W)$ in (25a), Theorem 3 follows. ■
Spatial Correlation and Number of Antennas: Rank-one statistical
beamforming along any eigenvector of $\mathbf{A}$ associated with
$\sigma_0$ is optimal to attain the wideband asymptotes of Theorem 3.
For channels that are spatially uncorrelated at the transmitter,
this result implies that using only one transmit antenna is optimal,
as previously shown in [7] for the frequency-flat time-selective
case. To further assess the impact of correlation on capacity, we
follow [8], [10], [16] and define a partial ordering of correlation
matrices through majorization [34]. We say that a correlation
matrix $\mathbf{K}$ entails more correlation than a correlation matrix
$\mathbf{C}$ if the vector of eigenvalues $\boldsymbol{\lambda}(\mathbf{K})$
majorizes $\boldsymbol{\lambda}(\mathbf{C})$. To assess the impact of
spatial correlation on capacity, we further need the following
definition [34]: a scalar function $\phi(\mathbf{z})$ of a vector
$\mathbf{z}$ is Schur concave if $\phi(\mathbf{z}) \le \phi(\mathbf{q})$
whenever $\mathbf{z}$ majorizes $\mathbf{q}$.
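The majorization ordering can be made concrete with the eigenvalue vectors used in the numerical examples. The following sketch is illustrative only: the helper names `majorizes`, `sum_of_squares`, and `sum_of_logs` are hypothetical, and the two test functions are standard textbook examples of a Schur-convex and a Schur-concave function, chosen here for illustration rather than taken from the paper.

```python
import numpy as np

def majorizes(z, q, tol=1e-9):
    """Return True if z majorizes q: equal sums, and every partial sum of the
    decreasingly sorted z dominates the corresponding partial sum of q."""
    z = np.sort(np.asarray(z, dtype=float))[::-1]
    q = np.sort(np.asarray(q, dtype=float))[::-1]
    if z.shape != q.shape or abs(z.sum() - q.sum()) > tol:
        return False
    return bool(np.all(np.cumsum(z) >= np.cumsum(q) - tol))

def sum_of_squares(v):
    """Example of a Schur-convex function."""
    return float(np.sum(np.square(v)))

def sum_of_logs(v):
    """Example of a Schur-concave function: sum_i log(1 + v_i)."""
    return float(np.sum(np.log1p(v)))

uncorrelated = [1.0, 1.0, 1.0]   # eigenvalues of I_3
receive_corr = [2.6, 0.3, 0.1]   # receive correlation of Fig. 2
transmit_corr = [1.7, 1.0, 0.3]  # transmit correlation of Fig. 3

for name, vec in [("uncorrelated", uncorrelated),
                  ("transmit_corr", transmit_corr),
                  ("receive_corr", receive_corr)]:
    print(f"{name:14s} squares = {sum_of_squares(vec):.2f}  logs = {sum_of_logs(vec):.3f}")
```

In this ordering, `receive_corr` majorizes `transmit_corr`, which in turn majorizes `uncorrelated`; the Schur-convex example increases along this chain while the Schur-concave example decreases.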

In the coherent setting, capacity is Schur concave in
$\boldsymbol{\lambda}(\mathbf{B})$, the eigenvalue vector of the receive
correlation matrix, while, for sufficiently large bandwidth, it is
Schur convex in $\boldsymbol{\lambda}(\mathbf{A})$, the eigenvalue vector
of the transmit correlation matrix [17], [16]. Hence, in the
coherent setting receive correlation is detrimental at any
bandwidth while transmit correlation is beneficial at large
bandwidth. The intuition is that transmit correlation allows the
transmitter to focus its power into the maximum transmit eigenmode,
and the corresponding power gain offsets the reduction in effective
transmit signal space dimensions in the power-limited regime, i.e.,
at large bandwidth. On the other hand, receive correlation is
detrimental at any bandwidth because it reduces the effective
dimensionality of the receive signal space without any power
gain [18].

3 Differently from the coherent setting [17, Proposition 3], the multiplicity
of the largest eigenvalue of $\mathbf{A}$ is immaterial. If this multiplicity is larger than
one, we choose to transmit along the eigenvector corresponding to index $q = 0$
merely for notational simplicity.



On the basis of Theorem 3, we conclude that the picture is
fundamentally different in the noncoherent setting. The coefficient
$a$ in (25) is a Schur-convex function in both the eigenvalue
vector $[\sigma_0 \;\; \sigma_1 \;\; \cdots \;\; \sigma_{M_T-1}]$ of the transmit
correlation matrix and the eigenvalue vector
$[\lambda_0 \;\; \lambda_1 \;\; \cdots \;\; \lambda_{M_R-1}]$ of the receive
correlation matrix because $\sigma_0$ and $\vartheta$ are continuous
convex functions of the corresponding eigenvalue vectors [35].
Hence, both receive and transmit correlation are beneficial for
sufficiently large bandwidth. This observation agrees with the
results for memoryless and block-fading channels reported in
[10]-[12]. In the wideband regime, while transmit correlation is
beneficial in both the coherent and the noncoherent setting because
it allows for power focusing, receive correlation is beneficial
rather than detrimental in the noncoherent setting for the
following reason: for fixed $M_T$ and $M_R$, the rate gain obtained
from additional bandwidth is offset in the wideband regime by the
corresponding increase in channel uncertainty (see Figs. 1, 2,
and 3); yet, for fixed but large bandwidth, channel uncertainty
decreases in the presence of receive correlation so that capacity
increases.

The Lower Bound $L_1(W,Q)$ in the Wideband Regime: Since
we know that the first-order Taylor expansions around $1/W = 0$
of the upper bound $U_1(W)$ and the lower bound $L_1(W,Q)$ do
not match, it is surprising that the corresponding curves seem
to coincide in Figs. 1, 2, and 3 for large bandwidth. The reason
is that, for typical values of $TF$ and $\beta$, the ratio between
the first-order coefficients in the Taylor expansions of
$L_1(W,Q)$ and $C(W)$ approaches 1 as $\kappa_H$ grows large and
$Q = 1$. For example, the ratio is 0.998 for the parameters used in
the numerical evaluation in Fig. 1, i.e., $\Delta_H = 10^{-3}$,
$\beta = 1$, and $TF = 1.25$.

V. Discussion and Outlook 

Capacity analysis in the noncoherent setting is frequently 
performed asymptotically for either large or small SNR, P/W. 
The corresponding asymptotic results are often useful to obtain 
design insight, but they may sometimes be misleading: capacity 
behavior is very sensitive to specific details of the channel model 
used at high SNR [36], and any channel model eventually breaks 
down for large enough bandwidth and correspondingly low SNR. 
The capacity bounds in the present paper are useful for a large 
range of bandwidth in between these two asymptotic cases (in 
addition, they are tight in the wideband regime). 

The discrete-time discrete-frequency channel model presented
in Section II-B and Section II-C is very general; at the same time,
the corresponding capacity bounds in Section III are relatively
simple for practically relevant values of $P/W$ and for realistic
scattering functions. Furthermore, as our discrete-time
discrete-frequency channel model is related to the continuous-time
WSSUS channel model (1), results from real-world channel
measurements can be directly used to obtain capacity estimates. In
particular, as the bounds hold both for the regime where degrees of
freedom increase capacity and for the regime where degrees of
freedom are detrimental, they allow us to numerically determine
a suitable combination of bandwidth and number of transmit
antennas.

For large bandwidth, the bounds are very accurate: the upper
bound $U_1(W)$ exhibits the correct asymptotic behavior
for $W \to \infty$, as shown in Section IV. For small and medium
bandwidth, the upper bound $U_1(W)$ is not tight, and is indeed
worse than the coherent capacity upper bound. The fact that our
simple lower bound $L_1(W,Q)$ comes quite close to the coherent
upper bound $U_c(W)$ in Fig. 1 seems to validate, at least for
the setting considered, the standard receiver design principle to
first estimate the channel and then use the resulting estimates
as if they were perfect. To verify this conjecture, though, it is
necessary to show that the combination of dedicated channel
estimation and coherent signaling achieves rates similar to those
predicted by the lower bound $L_1(W,Q)$.

The advent of ultrawideband (UWB) communication systems
spurred the current interest in wireless communications over
channels with very large bandwidth. Current UWB regulations
impose a limit on the power spectral density of the transmitted
signal, so that the available average power increases with
increasing transmission bandwidth. In contrast, we keep the total
average transmit power fixed in the present paper; therefore, the
results presented here do not directly apply to current UWB
regulations. Nonetheless, our bounds allow us to assess whether
multiple antennas at the transmitter are beneficial for UWB
systems. The system parameters used to numerically evaluate the
bounds in Section III-C are compatible with a UWB system that
operates over a bandwidth of 7 GHz and transmits at −41.3 dBm/MHz.
Even if our bounds are not tight at 7 GHz in this scenario,
Figs. 1, 2, and 3 show that the maximum rate increase that can be
expected from the use of multiple antennas at the transmitter does
not exceed 7%. For channels with smaller spreads than the one
in Section III-C, the possible rate increase is even smaller.
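The compatibility of the UWB figures with the transmit power of 0.5 mW used in Section III-C can be verified by integrating the power spectral density over the bandwidth. The short sketch below assumes a flat spectral mask at the stated level across the full 7 GHz.

```python
import math

psd_dbm_per_mhz = -41.3      # power spectral density quoted in the text
bandwidth_mhz = 7000.0       # 7 GHz expressed in MHz

# Total power for a flat spectrum: add 10*log10(bandwidth) to the density
total_power_dbm = psd_dbm_per_mhz + 10 * math.log10(bandwidth_mhz)
total_power_mw = 10 ** (total_power_dbm / 10)

print(f"total transmit power: {total_power_dbm:.2f} dBm = {total_power_mw:.3f} mW")
```

The result is close to 0.5 mW, in line with the transmit power assumed for the numerical examples.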



Appendix A 
A Determinant Inequality 

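Lemma 4: Let $\mathbf{A}$ and $\mathbf{B}$ be two $N \times N$ nonnegative definite
Hermitian matrices. Then,

$$
\det(\mathbf{I}_N + \mathbf{A} \odot \mathbf{B}) \ge \det(\mathbf{I}_N + (\mathbf{I}_N \odot \mathbf{A}) \mathbf{B}).
$$

Proof: Assume for now that $\mathbf{A}$ does not have zeros on its
main diagonal and define $\tilde{\mathbf{A}} = (\mathbf{I}_N \odot \mathbf{A})^{-1}$. Then,

$$
\begin{aligned}
\det(\mathbf{I}_N + \mathbf{A} \odot \mathbf{B})
&= \det\bigl( \mathbf{A} \odot (\tilde{\mathbf{A}} + \mathbf{B}) \bigr) \\
&\overset{(a)}{\ge} \det(\mathbf{I}_N \odot \mathbf{A}) \det(\tilde{\mathbf{A}} + \mathbf{B}) \\
&= \det\bigl( (\mathbf{I}_N \odot \mathbf{A}) \tilde{\mathbf{A}} + (\mathbf{I}_N \odot \mathbf{A}) \mathbf{B} \bigr) \\
&= \det\bigl( \mathbf{I}_N + (\mathbf{I}_N \odot \mathbf{A}) \mathbf{B} \bigr)
\end{aligned}
\tag{26}
$$

where (a) is a direct consequence of Oppenheim's inequality [37,
Theorem 7.8.6]. To conclude the proof, we remove the restriction
that $\mathbf{A}$ has only nonzero diagonal entries. Because $\mathbf{A}$ is
nonnegative definite, its $i$th row and its $i$th column are zero
if $[\mathbf{A}]_{ii} = 0$ [37, Section 7.1], so that, by the definition of the
Hadamard product, the $i$th row and the $i$th column of $\mathbf{A} \odot \mathbf{B}$
are zero as well. Let $\mathcal{I}$ be the set that contains all indices $i$ for
which $[\mathbf{A}]_{ii} = 0$, assume without loss of generality that there
are $L$ such indices, and let $\mathbf{A}_{\mathcal{I}}$ and $\mathbf{B}_{\mathcal{I}}$ denote the submatrices
of $\mathbf{A}$ and $\mathbf{B}$, respectively, with all rows and columns
corresponding to $\mathcal{I}$ removed. An expansion by minors of
$\det(\mathbf{I}_N + \mathbf{A} \odot \mathbf{B})$ now shows that

$$
\det(\mathbf{I}_N + \mathbf{A} \odot \mathbf{B}) = \det(\mathbf{I}_{N-L} + \mathbf{A}_{\mathcal{I}} \odot \mathbf{B}_{\mathcal{I}}).
\tag{27}
$$

Hence, it suffices to apply the inequality (26) to the RHS of (27).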
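A quick numerical sanity check of Lemma 4 is sketched below. It draws random nonnegative definite Hermitian matrices (the construction $\mathbf{G}\mathbf{G}^H$ with Gaussian $\mathbf{G}$ is an assumption made for the test, as are the helper names) and compares both sides of the determinant inequality; it does not replace the proof.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_psd(n):
    """Random nonnegative definite Hermitian matrix G G^H (illustrative construction)."""
    g = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return g @ g.conj().T

def lemma4_sides(A, B):
    """Return det(I + A o B) and det(I + (I o A) B), with 'o' the Hadamard product."""
    n = A.shape[0]
    lhs = np.linalg.det(np.eye(n) + A * B).real            # A * B is elementwise
    rhs = np.linalg.det(np.eye(n) + np.diag(np.diag(A)) @ B).real
    return lhs, rhs

results = [lemma4_sides(random_psd(4), random_psd(4)) for _ in range(100)]
worst_ratio = min(lhs / rhs for lhs, rhs in results)
print(f"smallest observed lhs/rhs over 100 draws: {worst_ratio:.3f}")
```

A ratio of at least one in every draw is consistent with the lemma; equality occurs, for example, when $\mathbf{A}$ is the all-ones matrix.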



Appendix B 
Optimization of the Upper Bound 



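The expression to be maximized in (15a),

$$
g(\alpha) = \sum_{r=0}^{M_R-1} \left[ \frac{W}{TF} \log\!\left( 1 + \alpha \lambda_r \frac{PTF}{W} \right) - \alpha\, G_r(W) \right]
$$

where $G_r(W)$ is given in (15b), is concave in $\alpha$. Hence, the
optimizing parameter $\alpha$ is unique. Furthermore, the following
two properties hold: (i) $g(\alpha) = 0$ for $\alpha = 0$. (ii) As, by Jensen's
inequality and because $\log(1 + x) \le x$,

$$
G_r(W) \le \frac{W \Delta_H}{\sigma_0 \beta} \log\!\left( 1 + \frac{\sigma_0 \beta \lambda_r P}{W \Delta_H} \right) \le \lambda_r P,
\tag{28}
$$

the first derivative of $g(\alpha)$,

$$
g'(\alpha) = \sum_{r=0}^{M_R-1} \left[ \frac{\lambda_r P}{1 + \alpha \lambda_r PTF/W} - G_r(W) \right],
\tag{29}
$$

is nonnegative at $\alpha = 0$.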

From properties (i) and (ii), and from the concavity of $g(\alpha)$, it
follows that the supremum in (15a) is achieved for $\alpha = \sigma_0$ if
and only if the zero of (29) occurs at a point larger than or equal
to $\sigma_0$, or, equivalently, if and only if (29) is positive for
$\alpha \in [0, \sigma_0)$. Identification of this zero-crossing is
difficult for $\mathrm{rank}(\mathbf{B}) > 1$, but we can obtain a sufficient
condition for the supremum to be achieved for $\alpha = \sigma_0$ as
follows: The first derivative (29) will certainly be positive if
all terms in the sum are positive.

As for all $\alpha$ in the set $[0, \sigma_0)$ the inequality

$$
\frac{\lambda_r P}{1 + \alpha \lambda_r PTF/W} \ge \frac{\lambda_r P}{1 + \sigma_0 \lambda_r PTF/W}
$$

holds, it follows from Jensen's inequality applied to $G_r(W)$
as in (28) that a sufficient condition for the $r$th term in (29)
to be positive in $[0, \sigma_0)$ is

$$
\frac{\lambda_r P}{1 + \sigma_0 \lambda_r PTF/W} \ge \frac{W \Delta_H}{\sigma_0 \beta} \log\!\left( 1 + \frac{\sigma_0 \beta \lambda_r P}{W \Delta_H} \right).
$$

This condition is very similar to one analyzed in [19,
Appendix C], and steps identical to the ones detailed in [19,
Appendix C] finally lead to (19).